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Biological evolution in a sequence space with random fitnesses is studied within Eigen's qua- 
sispecies model. A strong selection limit is employed, in which the population resides at a single 
sequence at all times. Evolutionary trajectories start at a randomly chosen sequence and proceed 
to the global fitness maximum through a small number of intermittent jumps. The distribution of 
the total evolution time displays a universal power law tail with exponent -2. Simulations show that 
the evolutionary dynamics is very well represented by a simplified shell model, in which the sub- 
populations at local fitness maxima grow independently. The shell model allows for highly efficient 
simulations, and provides a simple geometric picture of the evolutionary trajectories. 



Biological evolution often displays a punctuated dy- 
namical pattern, in the sense that quiescent periods of 
stasis alternate with bursts of rapid change. A vari- 
ety of mechanisms for punctuation have been proposed, 
which operate on different levels of the tree of life. 
On the largest scales of macroevolution, coevolutionary 
avalanches may play a role, which have been associated 
with self-organized criticality [Q. On the level of pop- 
ulations, punctuation due to rare, beneficial mutations 
has been observed in evolution experiments with bacte- 
ria Q. Similar behavior has been found in simulations 
of RNA evolution, where stasis corresponds to diffusion 
on a neutral network, and a punctuation event marks the 
transition to another network of higher fitness Q| . 

Possibly the simplest interpretation of punctuated evo- 
lution is in terms of a homogeneous population, repre- 
sented by a localized distribution in some phenotypic or 
genotypic space, which evolves in a static, multipeaked 
fitness landscape Q . Under conditions of strong selection 
and small mutation rate, such a population will rapidly 
climb a local fitness maximum, where it then resides for 
a long time, until a rare, large fluctuation allows it to 
cross the valley to a more favorable peak. At least in the 
limit of infinite population size ||, the mathematics of 
this process is closely related to physical problems such as 
noise-driven barrier crossing, tunneling |^| and variable- 
range hopping 0, and it is easy to show that the resi- 
dence time at one peak can be vastly larger than the time 
required for the transition to the next In a rugged 
fitness landscape, the sequence of transitions forms an 
evolutionary trajectory, which probes the distribution of 
fitness peaks and the geometry of the landscape. 

In this Letter we investigate the statistics of such evo- 
lutionary trajectories in the framework of Eigen's quasis- 
pecies model P]To|] . We consider a population of individ- 
uals, each characterized by a binary genomic sequence 
a of length N, which reproduce asexually and mutate 
in discrete time t. The total number of sequences is 
S = 2 N . An individual with genotype a leaves A{a) 
offspring in the next generation, and point mutations oc- 
cur with probability /i per site and generation. In a mean 



field approximation, which neglects finite population ef- 
fects as well as nonlinear constraints on the population 
size, this leads to the evolution equations 

z{a,t + l) =J2 A {<r')f J > d{u, ' 7 ' ) ( 1 - lj) N ~ dM z{a',t) 

a' 

(i) 

for the number z(a,t) of individuals of genotype er, 
where d(a, a') denotes the Hamming distance between 
sequences a and cr', i.e. the number of symbols in which 
the two differ. 

The fitness landscape enters (|l|) through the choice of 
A(a). For single peaked landscapes, the model is known 
to exhibit the error threshold phenomenon, in which a 
population which is concentrated around the fitness peak 
- the quasispecies - becomes delocalized with increasing 
mutation rate, increasing sequence length, or decreasing 
peak height HQ. Here we want to study the sudden 
transitions between the different quasispecies associated 
with the multiple peaks of a rugged fitness landscape [jl2] . 
This punctuated behavior becomes more pronounced the 
more strongly the peaks are able to localize the pop- 
ulation. We therefore pass to a strong selection limit 
p3[ , which is motivated by the zero temperature limit 
of the equivalent problem of a pinned directed polymer 
pd| , p^ . To this end the reproduction rates are written 
as A(a) = e kF ( a \ where k is an inverse selective temper- 
ature |D| . In the limit k — > oo the population variables 
take the form z(a, t) — e kEt - (T ' t ' > and the mutation rate per 
generation has to be scaled down as fi — e~~ fk for some 
constant 7 > 0. Then (Q) reduces to 

E{a, t + 1) = max[E(a',t) + F(a') - 7 d(cr, a')]. (2) 

a' 

The fitnesses F(a) > are chosen randomly and in- 
dependently from a distribution p(F). The numerical 
simulations presented in this paper were performed with 
p(F) = e~ F , but results for other distributions will be 
described as well. The value of 7 is set to unity. 

Evolutionary trajectories are generated as follows. At 
time t = the population is placed at a randomly cho- 
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sen sequence crW ) corresponding to the initial condition 
E(aW,0) = 0, E(a ^ <7«, 0) = -co. Subsequently (|) is 
iterated, and the position cr*(t) of the global maximum 
of i£(o", i) is recorded; in the strong selection limit, essen- 
tially the whole population resides at a*(t) at time t. The 
trajectory a*(t) diplays the expected punctuated pattern 
p3[ . After the first few time steps, it passes exclusively 
through local fitness maxima. The transitions between 
different local maxima are abrupt, and the time period 
spent at a given peak, as well as the probability for a 
peak to be visited by a trajectory with random starting 
point, increases strongly with its fitness. The end point 
of each trajectory is the global fitness maximum of the 
landscape. 

Here we focus on two statistical measures of the evo- 
lutionary trajectories: The distribution P(T) of times 
T required to reach the global maximum, and the dis- 
tribution P(n) of the number n of jumps between local 
maxima that occur on the way. In Figs .|l and|| numerical 
results for sequence lengths A = 8 - 17 are shown. The 
data were averaged over 10 5 trajectories, each evolving 
in a separate, independently generated fitness landscape. 
To make the problem numerically tractable despite the 
exponential growth of sequence space with increasing N, 
these simulations were carried out using a local version of 
the recursion relation Q) , in which the maximization on 
the right hand side is performed over nearest neighbor 
sequences with d(a, a') < 1 only. This implies that at 
most one point mutation is allowed per generation. The 
full dynamics (0) and the local version will be referred to 
as the global and the local model, respectively. The dif- 
ferences in behavior that arise due to the local restriction 
will be explained below. 

Notable features of the numerical results in Fig|l] in- 
clude (i) a (roughly linear) increase of the typical evolu- 
tion time with A, as represented e.g. by the maximum 
of P(T), and (ii) a power law tail P(T) ~ a(N)T- a with 
a w 2 for T ^> A. A careful analysis of the data for 
different sequence lengths shows no systematic depen- 
dence of a on N, and yields the overall best estimate 
a = 2.13 ± 0.04. If it is assumed that a = 2 exactly 
(see below), then the coefficient a (A) of the power law 
scales with sequence length as a(A) ~ A 1 - 7 . The distri- 
bution of the number of jumps in Fig.|^ is well fitted by 
a Gaussian. The mean number of jumps n is small, and 
the variance An 2 < n, indicating sub-Poissonian fluctu- 
ations. In the (admittedly small) range of A accessible 
to our simulations, the dependence on sequence length is 
reasonably well described by power laws, n w 0.56 • A 73 



and At 



0.17 • A - 74 . 



To gain some insight into these results, we now in- 
troduce an approximation to (^) which allows for some 
analysis, as well as for a much more efficient numerical 
scheme. The evolutionary dynamics (Q) and (^]) involves 
two distinct processes: The spreading of the population 
through sequence space by mutations, and the (expo- 



nential) growth of the local sub-populations by selection. 
Our key assumption is that the spreading part of the dy- 
namics is important only in the initial stages of evolution, 
while the late stage can be described as a competition 
between independently growing quasispecies associated 
with different local fitness maxima. 

More specifically, after the first time step the recursion 
relation (|J) with the initial condition localized at crM 
yields a (logarithmic) population distribution E(a, 1) = 
F(o~^) — jd(a, ctW) which is symmetric around and 
linearly decreasing with increasing distance from crW . 
Thus at t = 1 every sequence is seeded with a (possi- 
bly astronomically small) population. The approxima- 
tion consists of letting these populations evolve indepen- 
dently for t > 2, i.e. to replace (||) by the simple linear 
growth law 



E(a,t) = E(a,l) + {t-l)F(a). 



(3) 



Since E(a, 1) is constant on the shells of constant Ham- 
ming distance d(cr,a^) = k to the initial population, 
for each shell only the sequence with the largest fitness 
needs to be taken into account. The A shell fitnesses can 
be generated directly from the distribution of the largest 
among (, ) exponential random variables, corresponding 
to the (^) sequences contained in the shell. In this way 
the shell approximation vastly reduces the computational 
effort from 2 N to A. Fig.[| shows that the shell approx- 
imation works astoundingly well. For sequence lengths 
A = 8 and 10, for which a comparison is possible, the 
distributions P(T) and P(n) obtained (with little effort) 
using the shell approximation arc indistinguishable from 
those generated by the full model (j|) with global maxi- 
mization. 

The situation is more complicated for the local model 
used in Figs.0 and g, because in that case the seeding 
stage takes A time steps. This explains immediately the 
appearance of a typical evolution time of order A, which 
is absent in the global model: On average, it takes A/2 
time steps before the global fitness maximum has even 
been seeded. The seeding time for the shell at distance 
d(a, it") = k is k. The shell approximation can be im- 
plemented only for times t > k, where linear growth ac- 
cording to E(a,t) = E(a,k) + (t - k)F(a) sets in. This 
is however less useful than (||), because the computation 
of E(o~, k) already requires the solution of a nontrivial 
optimization problem: The population strength at the 
"seeding front" defined by d(cr, <jW) — t is given by 



E(a,t) 



•7*, 



(4) 



where n denotes a directed path of length t from to 
a. A reasonable shell approximation for the local model 
(which is however not as accurate as in the global case) 
is obtained by treating the first term on the right hand 
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side of (0) as a constant, setting Eia, k) = —jk in shell 
k. 

Using (||), the total evolution time T can be estimated 

as 

7 [rf( CT (/>y'))~ri(q(/- i )y'))] 

F((t(/))-F((t(/-i)) ' lj 

where denotes the global fitness maximum, and 

the last sequence visited before the global maxi- 
mum is reached. We expect that a^-D is close in fitness 
to the sequence with the second largest fitness in the sys- 
tem, and hence the denominator in (^|) is comparable to 
the fitness gap e of the landscape, which we define as the 
difference between the largest and the second largest of 
the S fitness values. To estimate the numerator of (||), 
we note that most sequences reside in a belt of thickness 
~ y/~N around the mean distance N/2 from erW, and 
hence d(a {f \a^) - e^- 1 ), cr») ~ \/N. We conclude 
that T ~ y/N /e. The inverse correlation between T and 
e for a given landscape has been explicitly verified in the 
simulations fl6[| . 

The distribution P g (e) of the fitness gap can be com- 
puted from order statistics J17J . The distribution of evo- 
lution times then becomes P(T) ~ (\fN /T 2 )P g (\fN /T), 
which displays a power law tail with exponent a = 2 
for T > \/N and prefactor a(N) ~ VNP g (0). The 
T -2 -decay is universal, because P g (0) is always finite 
and nonzero [fl3[ . ft follows from the explicit expression 
that P g (0) ~ (AF max (5))~ 1 , where AF max (S) denotes 
the standard deviation of the maximum -fm ax (»S # ) among 
the S independent fitness values. For the exponential 
fitness distribution P g (e) — exp(— e) independent of N, 
so that a(N) ~ v^V- In contrast, for fat tailed fitness 
distributions such as power laws, a(N) decreases with 
increasing N. These predictions are well confirmed by 
simulations of the global shell model. 

Repeating the argument for the shell approximation 
to the local model described above, one finds P(T) ~ 
a{N)T~ 2 with a(N) ~ \/N(F max (S))P g (0). For the 
exponential distribution (F max (S)) = lnS* ~ N, hence 
a(N) ~ iV 3 / 2 , which is consistent with the numerical es- 
timate quoted above. We conclude that the asymptotic 
value of the evolution time exponent is a = 2 in all cases, 
and attribute the deviations found in the data in Fig.j^to 
short time corrections originating in the seeding stage. 

Within the shell model, the problem of determining 
the number of jumps along an evolutionary trajectory 
has a simple geometric interpretation. Eq.(^) defines a 
family of N straight lines in the (t, F)-plane, one for each 
shell. The intercept of the line corresponding to shell k 
is F(a^) — jk and its slope is the shell fitness F k , i.e. 
the largest among ( fe ) independent fitness values. The 
evolutionary trajectory a*(t) then corresponds to the up- 
per envelope of this family of lines, and the number of 
jumps is the number of corners of this random polygon. 



ft follows immediately that the number of jumps is al- 
ways smaller than TV. To go beyond this trivial bound 
is not easy, because the shell fitnesses are not identically 
distributed, i.e. the distribution of Fk depends on k. For 
large N the global maximum is certain to lie near the shell 
k = N/2, so that n < N/2. If one assumes that only se- 
quences with fitnesses comparable to the the global max- 
imum, which reside in the belt around fc = N/2, con- 
tribute to the trajectory, then n ~ y/N. Simulations of 
the global shell model yield n ~ iV 60 and An 2 - N - 72 
for 4 < N < 29, but the data are not inconsistent with a 
linear dependence on 

More progress is possible for a simplified shell model in 
which the shell fitnesses are identically distributed (this 
corresponds to a one-dimensional sequence space). Sup- 
pose that at some time t the population resides in shell k. 
The next jump then occurs to the shell k' > k for which 
(i) F^ > F k and (ii) the time t(k, k') = -y(k' -k)/(F k -F k ) 
at which the two lines cross is minimal among all such 
k! . If the second requirement is ignored and one simply 
choses the next shell with k' > k and F' k > Fk, the prob- 
lem reduces to that of record dynamics, for which the 
number of jumps is Poisson distributed with mean IniV 
fLSfl . Since the greedy record algorithm evidently pro- 
duces more jumps than the original dynamics, the mean 
number of jumps in the uniform shell model is bounded 
from above by IniV, independent of the fitness distribu- 
tion. Simulations of this model indicate that in general 
n ~ f31nN, where the coefficient (3 < 1 depends on the 
extremal statistics of the fitness distribution. Specifically, 
we conjecture that (3—1/2 for distributions similar to 
an exponential, (3 — (5 — 1)/(2S — 1) for power law dis- 
tributions p(F) - F~( s+1 \ and f3 = (2 + + 2v) for 
bounded distributions with p(F) ~ (F — F) v , F < F . 

In conclusion, we have explored the statistical prop- 
erties of evolutionary trajectories in a sequence space 
equipped with a rugged fitness landscape. The simple 
but highly accurate shell approximation elucidates the 
interplay of the geometry of sequence space and the ex- 
tremal statistics of fitness values, and allows for efficient 
numerical investigations. It seems worthwhile to extend 
these studies to correlated fitness landscapes and differ- 
ent types of graphs, with the aim of using evolutionary 
trajectories as a means to characterize the geometry of 
general optimization problems. 
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FIG. 2. Distribution of the number of jumps for the local 
model, shifted and scaled by its mean and standard deviation. 
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FIG. 1. Distribution of evolution times for the local model. 
The data have been binned exponentially, hence the measured 
exponent is a — 1. 
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FIG. 3. Comparison between the full global dynamics and 
the shell model for sequence length N = 10. The number of 
realizations was 10 6 for the full model and 10 7 for the shell 
model. Upper panel: Binned distribution of evolution times. 
Lower panel: Distribution of the number of jumps. 
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